JUBE Interoperability

JUBE is a benchmark automation tool developed by the Jülich Supercomputing Centre.

You can find the GitHub repo here: https://github.com/FZJ-JSC/JUBE

And the website here: https://apps.fz-juelich.de/jsc/jube/docu/index.html

JUBE Templates

JUBE defines the machines on which it operates via “platform.xml” files and job templates. remotemanager provides interoperability with these definitions via the JUBETemplate module, which is available at remotemanager.JUBEInterop

This object provides a modified from_repo which can pull these files automatically.

Note

By default, from_repo is pointed at the JUBE4MaX repository; however, you can change this via the repo argument (point it at the root of the repo).

You should then specify a path to the directory containing the platform and template files.

The target names for these files can be changed by updating platform_name and template_name.

Since the file names are likely to clash when pulling multiple machines, they are stored by default in a directory extracted from the machine name. You can also set this directory explicitly with the local_dir argument.
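For reference, here is a sketch of a fully explicit call. The repository URL, path, and directory names below are hypothetical; only the argument names (repo, path, platform_name, template_name, local_dir) are those described above.

from remotemanager.JUBEInterop import JUBETemplate

# hypothetical repository and paths, for illustration only
template = JUBETemplate.from_repo(
    repo="https://gitlab.com/my-group/my-jube-fork",  # root of the repo to pull from
    path="platforms/my_machine",     # directory containing the platform files
    platform_name="platform.xml",    # platform definition to pull
    template_name="submit.job",      # job template to pull
    local_dir="my_platform_store",   # local storage directory
)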

[2]:
from remotemanager.JUBEInterop import JUBETemplate

template = JUBETemplate.from_repo(path="max-inputs/platforms/cineca/leonardo/booster", local_dir="temp_platform_store")
searching for platform.xml & submit.job at https://gitlab.com/max-centre/JUBE4MaX/-/raw/develop/max-inputs/platforms/cineca/leonardo/booster
Grabbed file 'temp_platform_store/platform.xml'
Grabbed file 'temp_platform_store/submit.job'

After a successful file collection, you will now be able to generate jobscripts using this computer. Let's set some basic arguments and print a sample.

[3]:
template.accountno = "test_acc"
template.nodes = 24
template.ncpus = 128
template.ncores = 128
template.taskspernode = 32

template.executable = "bigdft"
[4]:
script = template.script()
print(script)
#!/bin/bash
#SBATCH --nodes=24
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --account=#ACCOUNT_NO#
#SBATCH --partition=boost_usr_prod
#SBATCH --qos=normal

module purge
export OMP_NUM_THREADS=1

scontrol show jobid -dd $SLURM_JOB_ID > scontrol.out




sacct -j $SLURM_JOB_ID --long > sacct.out

touch #READY#

Parameterisation

Since these platforms are intended to be used within the JUBE infrastructure, you will need to be careful to set the correct parameters. If you’re not sure which parameters to set, you can check the downloaded files, or print the arguments property.

[5]:
print(template.arguments)
['jube_benchmark_name', 'queue', 'timelimit', 'starter', 'args_starter', 'measurement', 'outlogfile', 'errlogfile', 'executable', 'args_executable', 'touch $ready_file', 'nodes', 'threadspertask', 'taskspernode', 'taskspersocket', 'cpuspertask', 'pe', 'gres', 'accountno', 'qos', 'modules', 'preprocess', 'postprocess', 'wrapname', 'wrappre', 'wrappost', 'ready_file', 'make', 'cc', 'cflags', 'mpi_cc', 'mpi_cxx', 'mpi_f90', 'mpi_f77', 'load_module', 'mapping', 'submit', 'submit_script', 'shared_folder', 'shared_job_info', 'nsocket', 'nodecpus', 'ncores', 'threads', 'env', 'ngpus', 'tasks', 'memnodemachine', 'minmem']
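Any of these can be set as attributes on the template, just like the basic arguments earlier. For example (the values here are purely illustrative; both names appear in the arguments list above):

# illustrative values only, not recommendations for this machine
template.qos = "normal"
template.timelimit = "12:00:00"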

Missing Parameters

By default, when encountering a missing argument for a substitution, BaseComputer will delete the whole line. The rationale is that a jobscript line like #PRAGMA flag=#argument# is best deleted entirely when the argument is empty, as a malformed flag would cause the job to fail.

JUBETemplate instead uses the “local” empty behaviour for substitutions by default. This means that any missing parameter is removed “locally” (only the substitution itself is dropped), rather than the whole line being deleted.

If you compare the earlier example with the template, you can see many instances of this. On the mpirun call especially, arguments are missing, yet the line itself is still present.
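To make this comparison yourself, you can inspect the downloaded template directly. This assumes the files were stored under temp_platform_store, as in the earlier cell:

# print the raw job template; substitutions appear as #NAME# placeholders
# (compare with the #READY# placeholder left in the generated script above)
with open("temp_platform_store/submit.job") as f:
    print(f.read())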

Temporary Values

Just as BaseComputer can accept “temporary” values within the script() method, so can JUBETemplate.

[6]:
print(template.script(nodes=128))
#!/bin/bash
#SBATCH --nodes=128
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --account=#ACCOUNT_NO#
#SBATCH --partition=boost_usr_prod
#SBATCH --qos=boost_qos_bprod

module purge
export OMP_NUM_THREADS=1

scontrol show jobid -dd $SLURM_JOB_ID > scontrol.out




sacct -j $SLURM_JOB_ID --long > sacct.out

touch #READY#

Just like the BaseComputer temporary values, these exist for only a single run.

[7]:
# grab the --nodes line from a fresh script; the temporary value has not persisted
nodes = template.script().split("\n")[1]

print(nodes)
#SBATCH --nodes=24
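If you want a change to persist across runs instead, set the attribute directly, as was done for the initial parameters. A minimal sketch:

# assigning the attribute stores the value on the template itself,
# so every subsequent script() call should now report --nodes=128
template.nodes = 128
print(template.script().split("\n")[1])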